.NET Debugging : Managed Heap and Garbage Collection

8/10/2011 9:11:16 AM

The garbage collector in CLR 4.0 has undergone some exciting changes, primarily the exposure of additional diagnostics and the addition of a new garbage collection mode called background garbage collection.

Extended Diagnostics

The SOS debugger commands give us insight into how the GC works and what information we can use to troubleshoot difficult application problems. With CLR 4.0, the SOS debugger extension has been extended to include a new set of commands that further aid in troubleshooting application problems related to the GC. In this section, we will take a look at these new commands and how they can be used.

VerifyObj

The first command of interest is the VerifyObj command, which has the following syntax:

!VerifyObj <object address>

The command takes an object address and checks the object for possible signs of corruption. The check focuses primarily on making sure that the method table is intact, both for the object itself and for any contained objects. If you suspect heap corruption, the output of this command can serve as a quick indicator. Here is an example of a corrupt object and the output of the VerifyObj command:

0:000>!VerifyObj 0x02126804
object 0x2126804 does not have valid method table

FindRoots

Finding the reason why an object has not yet been collected can be a tedious process. Objects that have “simple” roots are relatively straightforward, but at times, an object’s root can be less than straightforward to spot. For example, if an object has a cross-generational reference to it and the referencing generation has not yet been collected, the object will still appear to be live and well. To make life easier when detecting these cross-generational references, the FindRoots command can be used:

!FindRoots -gen <N> | -gen any | <object address>

The FindRoots command instructs the runtime to set a breakpoint the next time a garbage collection occurs in the specified generation (using the -gen <N> switch) or any time a garbage collection occurs regardless of the generation (using the -gen any switch). After the breakpoint hits, the FindRoots command can be fed an object’s address to display the roots for the object.

The first step in the process is typically finding the generation that the object belongs to by using the GCWhere command:

0:003> !GCWhere 00b0b400
Address   Gen   Heap   segment   begin       allocated       size
00b0b400  0     0      00b00000  00b01000    00b0c010        0xc(12)

The output shows that the object at address 0x00b0b400 belongs to generation 0.

Next, we use the FindRoots command to break on the next garbage collection and resume execution. Because we pass -gen any, a collection in any generation triggers the breakpoint:

0:003> !FindRoots -gen any
0:003> g
(710.970): CLR notification exception - code e0444143 (first chance)
CLR notification: GC - Performing a gen 2 collection. Determined surviving objects...
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0013f118 ebx=00000000 ecx=00000000 edx=00000006 esi=0013f1dc edi=00000003
eip=7c812afb esp=0013f114 ebp=0013f168 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
KERNEL32!RaiseException+0x53:
7c812afb 5e              pop     esi

After the breakpoint is hit, we can use the FindRoots command with the object’s address to find out the roots of the object:

0:000> !FindRoots 00b0b400
Scan Thread 0 OSTHread 970
ESP:13fac8:Root: 01b01010(System.Object[])->
  00b0ab38(System.Collections.Hashtable)->
  00b0ab70(System.Collections.Hashtable+bucket[])->
  00b0b400(System.Int32)
Scan Thread 2 OSTHread acc
DOMAIN(0016CB98):HANDLE(Pinned):9713fc:Root: 01b01010(System.Object[])->
  00b0ab38(System.Collections.Hashtable)->
  00b0ab70(System.Collections.Hashtable+bucket[])->
  00b0b400(System.Int32)

HeapStat

The HeapStat command shows a nice and detailed breakdown of the used and free bytes for each generation on each managed heap. Additionally, it provides a summary view showing free versus used memory (as a percentage) on the small object heap (SOH) and large object heap (LOH). The syntax of the command is

!HeapStat [-inclUnrooted | -iu]

The default output shows all rooted objects. The inclUnrooted (or iu shortcut) switch can be used to include all rooted as well as unrooted objects. Here is an example of running the HeapStat command on the 05OOM.exe application:

0:004> !HeapStat
Heap      Gen0   Gen1   Gen2   LOH
Heap0  2166844 134200 159064 33328

Free space                              Percentage
Heap0  1865804     12     36    96      SOH: 75% LOH:  0%

The first table shows the size of each generation: 2.1MB in Gen 0, 134KB in Gen 1, 159KB in Gen 2, and finally 33KB in the LOH. The free space table shows that 1.8MB of Gen 0 is free, along with 12 bytes in Gen 1, 36 bytes in Gen 2, and 96 bytes in the LOH.
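The percentage column can be reproduced from the byte counts. The following is a minimal sketch, assuming that the first table reports the total size of each generation (free space included) and that each percentage is the free bytes divided by that total, rounded down. The function and variable names are illustrative only and are not part of SOS.

```python
# Byte counts copied from the !HeapStat output above for Heap0.
size = {"Gen0": 2166844, "Gen1": 134200, "Gen2": 159064, "LOH": 33328}
free = {"Gen0": 1865804, "Gen1": 12, "Gen2": 36, "LOH": 96}

# The small object heap (SOH) is made up of generations 0, 1, and 2.
SOH_GENERATIONS = ("Gen0", "Gen1", "Gen2")


def free_percentage(generations):
    """Free bytes as a whole-number percentage of the listed generations.

    Assumption: the percentage is truncated, not rounded to nearest.
    """
    total_size = sum(size[g] for g in generations)
    total_free = sum(free[g] for g in generations)
    return total_free * 100 // total_size


soh_pct = free_percentage(SOH_GENERATIONS)
loh_pct = free_percentage(("LOH",))
print(f"SOH: {soh_pct}% LOH: {loh_pct}%")  # SOH: 75% LOH: 0%
```

Under these assumptions the sketch yields the same 75% and 0% figures that HeapStat printed, which suggests that nearly all of the free space on the small object heap sits in generation 0.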

GCWhere

Without a dedicated command, finding the generation that an object belongs to involves dumping out the managed heap segments (using the eeheap command) and then matching the address of the object to one of the segments listed in the output (the output specifies which generation each segment corresponds to). This process may work fine if you’re only trying to find the generation of one or two objects, but any more than that and it becomes rather tedious. Fortunately, SOS 4.0 introduces a command called GCWhere that displays information about the object passed in as an argument. The syntax of the command is

!GCWhere <object address>

Here is an example of the output when run against a FileStream object:

0:000> !GCWhere 0x01efd3f8
Address    Gen  Heap   segment    begin      allocated          size
01efd3f8   0    0      01ea0000   01ea1000   020f99cc       0x50(80)

The output shows the address of the object in question (0x01efd3f8), the generation to which the object belongs (Gen 0), the managed heap (0), the segment pointer (0x01ea0000), the starting address of the segment (0x01ea1000), the address up to which the segment has been allocated (0x020f99cc), and finally the size of the object (0x50). Please note that the size is not the recursive size (i.e., it does not include the size of child objects).
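As a quick sanity check, the fields can be related to one another with a little address arithmetic. The sketch below assumes that the begin and allocated columns bound the part of the segment that currently holds objects. The helper name in_segment is hypothetical and is not an SOS command.

```python
# Values copied from the !GCWhere output above (hexadecimal).
obj_addr = 0x01EFD3F8
seg_begin = 0x01EA1000
seg_allocated = 0x020F99CC
obj_size = 0x50


def in_segment(addr, begin, allocated):
    """Return True if addr lies in the range [begin, allocated)."""
    return begin <= addr < allocated


print(in_segment(obj_addr, seg_begin, seg_allocated))  # True
print(hex(obj_addr - seg_begin))  # 0x5c3f8, the offset from the segment start
print(obj_size)                   # 80, matching the 0x50(80) size column
```

This is the same range comparison that the manual eeheap approach requires for every segment, which is why doing it by hand becomes tedious for more than a few objects.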

ListNearObj

The ListNearObj command can be used to validate the consistency of the heap. The command takes an object address as an argument and attempts to validate the objects immediately before and after the specified object. The syntax of the command is shown here:

!ListNearObj <object address>

For example, running the ListNearObj against an object that is valid and is surrounded by valid objects yields the following output:

0:000> !ListNearObj 0x01efd3f8
Before:  01efd3c8   48 (0x30)   System.Collections.Hashtable+bucket[]
Current: 01efd3f8   80 (0x50)   System.IO.FileStream
After:   01efd448   28 (0x1c)   System.String
Heap local consistency confirmed.

The output is broken down into the before, current, and after followed by the result of the validation. The before, current, and after sections specify the object’s address, size, and type. In the preceding example, all three objects were considered valid and therefore the command considers the heap local consistency to be intact. If, on the other hand, we run the command against an object that is corrupted (where the size of the object has been overwritten), we see the following output:

0:000> !ListNearObj 0x01efd3f8
Before:  01efd3c8           48 (0x30)   System.Collections.Hashtable+bucket[]
After:   01efd448           28 (0x1c)   System.String
Heap local consistency not confirmed.
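One way to read these two outputs is to check whether each object ends exactly where the next one begins. The sketch below only illustrates that address arithmetic with the values shown above. It is not a description of how SOS implements the check, and the function name is_contiguous is hypothetical.

```python
# (address, size) pairs copied from the first !ListNearObj output above.
healthy = [
    (0x01EFD3C8, 0x30),  # Before:  System.Collections.Hashtable+bucket[]
    (0x01EFD3F8, 0x50),  # Current: System.IO.FileStream
    (0x01EFD448, 0x1C),  # After:   System.String
]

# The second output lists only the neighbors; the current object is missing.
corrupted = [healthy[0], healthy[2]]


def is_contiguous(entries):
    """True if every object ends exactly where the next one begins."""
    return all(
        addr + size == next_addr
        for (addr, size), (next_addr, _) in zip(entries, entries[1:])
    )


print(is_contiguous(healthy))    # True
print(is_contiguous(corrupted))  # False: an 0x50-byte gap is unaccounted for
```

In the healthy case, 0x01efd3c8 + 0x30 lands on 0x01efd3f8, and 0x01efd3f8 + 0x50 lands on 0x01efd448, so the three objects tile the heap with no gaps.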

AnalyzeOOM

SOS 4.0 introduces a new command called AnalyzeOOM that helps in the out-of-memory diagnosis process. The syntax for the command is shown here:

!AnalyzeOOM

Let’s use a small application called 10OOM.exe to illustrate how the command can be used. The 10OOM.exe application simply sits in a tight loop and allocates large amounts of memory until the memory is exhausted. Run the application under the debugger until the out-of-memory exception is thrown:

(2b14.281c): C++ EH exception - code e06d7363 (first chance)
(2b14.281c): CLR exception - code e0434352 (first chance)
ModLoad: 75370000 75378000   C:\Windows\system32\VERSION.dll

Unhandled Exception: OutOfMemoryException.
(2b14.281c): CLR exception - code e0434352 (!!! second chance !!!)
eax=0030edc0 ebx=00000005 ecx=00000005 edx=00000000 esi=0030ee6c edi=003fa160
eip=75e242eb esp=0030edc0 ebp=0030ee10 iopl=0          nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000              efl=00000216
KERNEL32!RaiseException+0x58:
75e242eb c9              leave

Next, we execute the AnalyzeOOM command:

0:000> !AnalyzeOOM
Managed OOM occured after GC #247 (Requested to allocate 1048576 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)

The output tells us that an out-of-memory condition occurred after the 247th garbage collection and that the requested amount of memory was 1048576 bytes (1MB). It also gives us the reason behind the condition (low on memory during garbage collection). Lastly, the details section tells us that it failed to reserve 16777216 bytes (16MB) of memory, which corresponds to the smallest segment size on the small object heap.

In our example, it’s quite clear why the out-of-memory condition occurred, but there may be other reasons such as the CLR attempting to allocate internal data structures or other components throwing out-of-memory exceptions.

Background Garbage Collection

Prior to CLR 4.0, the garbage collector could work in two different modes. The first mode, known as workstation mode (or concurrent GC), targets applications running on workstations, such as UI applications. The second mode, known as server mode (or blocking GC), targets server-side applications that typically do not require any UI. The reason behind having two modes lies primarily in the response time while a garbage collection occurs. While a garbage collection is underway, the execution engine and associated managed threads must be periodically suspended to avoid triggering another garbage collection. This suspension of managed threads creates a short pause that can be noticeable to users of the application. In workstation applications with a UI, this can result in the UI flashing or in lag between a user click and the action associated with the click. In these cases, it is crucial that the amount of time that threads stay suspended be as small as possible.

The concurrent (or workstation) GC accomplishes this by suspending all the managed threads only twice during a GC rather than throughout the entire duration, as is the case with the server GC. During the time that the managed threads are not suspended, they are allowed to keep allocating memory up until the end of the ephemeral segment. If the ephemeral segment is exhausted while a concurrent GC is underway, the managed threads are suspended until the concurrent GC completes (turning the concurrent GC into a blocking GC). In essence, this means that as long as the ephemeral segment is not exhausted, lag times can be avoided.

On the other hand, the server GC doesn’t have to worry about immediate response times as much as the workstation GC, because most server applications, unlike UI applications, do not need immediate response times. Instead of allowing allocations to occur while the GC is working, the server mode GC keeps all managed threads suspended throughout the duration of a GC. Although this may result in a nonvisible lag time while a GC is underway, the benefit of a server GC is higher throughput, primarily due to not having to worry about other managed threads working at the same time. Additionally, in a server GC, each processor has a dedicated GC thread which, in turn, means that we can have X GCs happen at the same time (where X is the number of processors * the number of cores).

One of the primary drawbacks of the existing concurrent GC is that it works well only with applications that have relatively small managed heaps (remember that as long as the ephemeral segment is not exhausted, managed threads can keep allocating memory). In today’s world, it is not uncommon to have applications whose managed heaps are in the gigabytes. In these cases, lag times can still be experienced because the concurrent GC becomes a blocking GC when the ephemeral segment limit has been reached.

To address this deficiency, CLR 4.0 replaces the concurrent GC with what is known as the background GC. The biggest difference between a concurrent GC and a background GC is that the background GC allows a full GC and allocations to happen at the same time while also allowing collections of generations 0 and 1. The background GC periodically checks to see if a concurrent allocation resulted in a GC in an ephemeral segment and, if so, suspends itself and allows the ephemeral GC to take place (foreground GC). This means that although a full GC is taking place, we still have the capability to get rid of dead short-lived objects. Because the background GC allows for collections in generations 0 and 1, what happens if the threshold of the ephemeral segment is reached? In this case, the foreground GC grows the ephemeral segment as needed.

To summarize, server GCs always block throughout the duration of a GC. To avoid lag times during this suspension, the workstation GC was introduced, which minimizes the time threads spend in the suspended state during a GC. Although this approach works well with applications that have a relatively small managed heap footprint, lag times can still be observed if the ephemeral segment is exhausted. This deficiency led to an evolution of the workstation GC called the background GC, which allows for true concurrent allocations and ephemeral collections (and segment expansion if needed).

Please note that in CLR 4.0, the background GC is only available in workstation mode.